Scalable Sentiment Classification for Big Data Analysis Using Naı̈ve Bayes Classifier

نویسندگان

  • Bingwei Liu
  • Erik Blasch
  • Yu Chen
  • Dan Shen
  • Genshe Chen
چکیده

A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this paper, we aim to evaluate the scalability of Naı̈ve Bayes classifier (NBC) in large datasets. Instead of using a standard library (e.g., Mahout), we implemented NBC to achieve fine-grain control of the analysis procedure. A Big Data analyzing system is also design for this study. The result is encouraging in that the accuracy of NBC is improved and approaches 82% when the dataset size increases. We have demonstrated that NBC is able to scale up to analyze the sentiment of millions movie reviews with increasing throughput. Keywords—Cloud computing, Big data, Polarity mining, sentiment classification

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable sentiment classification for Big Data analysis using Naïve Bayes Classifier

A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this pape...

متن کامل

An empirical study of sentiment analysis for chinese documents

Up to now, there are very few researches conducted on sentiment classification for Chinese documents. In order to remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, Naı̈ve Bayes and SVM) are invest...

متن کامل

Applying Naive Bayes Classification to Google Play Apps Categorization

There are over one million apps on Google Play Store and over half a million publishers. Having such a huge number of apps and developers can pose a challenge to app users and new publishers on the store. Discovering apps can be challenging if apps are not correctly published in the right category, and, in turn, reduce earnings for app developers. Additionally, with over 41 categories on Google...

متن کامل

Language-Independent Twitter Sentiment Analysis

Millions of tweets posted daily contain opinions and sentiment of users in a variety of languages. Sentiment classification can benefit companies by providing data for analyzing customer feedback for products or conducting market research. Sentiment classifiers need to be able to handle tweets in multiple languages to cover a larger portion of the available tweets. Traditional classifiers are h...

متن کامل

Sentiment Analysis and Deep Learning: A Survey

Deep learning has an edge over the traditional machine learning algorithms, like SVM and Naı̈ve Bayes, for sentiment analysis because of its potential to overcome the challenges faced by sentiment analysis and handle the diversities involved, without the expensive demand for manual feature engineering. Deep learning models promise one thing given sufficient amount of data and sufficient amount o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013